0 Summary on Common Distributions

| Name | Distribution | $E$ | $\mathrm{Var}$ | MGF | Char.[1] |
|---|---|---|---|---|---|
| Bernoulli$(p)$ | $P(X=1)=p,\ P(X=0)=1-p$ | $p$ | $p(1-p)$ | $1-p+pe^{t}$ | $1-p+pe^{it}$ |
| Binomial$(n,p)$ | $P(S_n=k)=\binom{n}{k}p^k(1-p)^{n-k}$ | $np$ | $np(1-p)$ | $(1-p+pe^{t})^{n}$ | $(1-p+pe^{it})^{n}$ |
| Multinomial | | | | | |
| Geometric$(p)$ | $P(W=k)=(1-p)^{k-1}p,\ k\in\mathbb{N}$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ | $\frac{pe^{t}}{1-(1-p)e^{t}}$ | $\frac{pe^{it}}{1-(1-p)e^{it}}$ |
| NB$(r,p)$[2] | $P(F_r=k)=\binom{r+k-1}{k}p^r(1-p)^k$ | $\frac{r(1-p)}{p}$ | $\frac{r(1-p)}{p^2}$ | $\left(\frac{p}{1-(1-p)e^{t}}\right)^{r}$ | $\left(\frac{p}{1-(1-p)e^{it}}\right)^{r}$ |
| Hypergeom$(N,B,n)$[3] | $P(X=k)=\frac{\binom{B}{k}\binom{N-B}{n-k}}{\binom{N}{n}}$ | $n\frac{B}{N}$ | $n\frac{B}{N}\left(1-\frac{B}{N}\right)\frac{N-n}{N-1}$ | | |
| Poisson$(\lambda)$ | $P(X=k)=\frac{\lambda^k}{k!}e^{-\lambda}$ | $\lambda$ | $\lambda$ | $e^{\lambda(e^{t}-1)}$ | $e^{\lambda(e^{it}-1)}$ |
| Uniform$(a,b)$ | $f(x)=\frac{1}{b-a}\mathbf{1}_{[a,b]}$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ | $\frac{e^{tb}-e^{ta}}{t(b-a)}$ | $\frac{e^{itb}-e^{ita}}{it(b-a)}$ |
| Laplace$(\mu,b)$ | $f(x)=\frac{1}{2b}\exp\left(-\frac{\lvert x-\mu\rvert}{b}\right)$ | $\mu$ | $2b^2$ | $\frac{e^{t\mu}}{1-b^2t^2}$ | $\frac{e^{it\mu}}{1+b^2t^2}$ |
| $N(\mu,\sigma^2)$ | $f(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $\mu$ | $\sigma^2$ | $e^{t\mu+\frac{1}{2}\sigma^2t^2}$ | $e^{it\mu-\frac{1}{2}\sigma^2t^2}$ |
| $\chi^2_k$ | | $k$ | $2k$ | $(1-2t)^{-k/2}$ | $(1-2it)^{-k/2}$ |
| Gamma$(\alpha,\beta)$ | $f(x)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x},\ x>0$ | $\frac{\alpha}{\beta}$ | $\frac{\alpha}{\beta^2}$ | $\left(1-\frac{t}{\beta}\right)^{-\alpha}$ | $\left(1-\frac{it}{\beta}\right)^{-\alpha}$ |
| Exp$(\lambda)$ | $f(x)=\lambda e^{-\lambda x},\ x\ge 0$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ | $\left(1-\frac{t}{\lambda}\right)^{-1}$ | $\left(1-\frac{it}{\lambda}\right)^{-1}$ |
| Beta$(\alpha,\beta)$ | $f(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$ | $\frac{\alpha}{\alpha+\beta}$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ | $1+\sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\right)\frac{t^k}{k!}$ | ${}_1F_1(\alpha;\alpha+\beta;it)$ |
| $N(\mu,\Sigma)$ | | $\mu$ | $\Sigma$ | $e^{t^\top\mu+\frac{1}{2}t^\top\Sigma t}$ | $e^{it^\top\mu-\frac{1}{2}t^\top\Sigma t}$ |

1 Preliminary

1.1 Generalized Binomial Coefficient

For $a\in\mathbb{C}$ and any integer $k\ge 0$, the generalized binomial coefficient is defined as $$\binom{a}{k}=\frac{a(a-1)(a-2)\cdots(a-k+1)}{k!}.$$
In particular, if $a=-m$ where $m\in\mathbb{N}$, then $\binom{-m}{k}=(-1)^k\binom{m+k-1}{k}$.

1.2 Newton Binomial Theorem

Let $a\in\mathbb{C}$ and $x\in\mathbb{C}$ with $|x|<1$. Then

$$(1+x)^a=\sum_{k=0}^{\infty}\binom{a}{k}x^k.$$
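As a quick numerical sketch (not part of the original notes), the generalized binomial coefficient, the identity for $a=-m$, and the convergence of Newton's series can all be checked with Python's standard library; the helper `gbinom` and the sample values of `m`, `k`, `a`, `x` are arbitrary choices for illustration.

```python
from math import comb, isclose

def gbinom(a: float, k: int) -> float:
    """Generalized binomial coefficient a(a-1)...(a-k+1)/k!."""
    result = 1.0
    for j in range(k):
        result *= (a - j) / (j + 1)
    return result

# Identity: C(-m, k) = (-1)^k * C(m+k-1, k)
m, k = 3, 5
lhs = gbinom(-m, k)
rhs = (-1) ** k * comb(m + k - 1, k)
assert isclose(lhs, rhs)

# Newton's binomial theorem: partial sums of sum_k C(a,k) x^k approach (1+x)^a for |x| < 1
a, x = -2.0, 0.3
series = sum(gbinom(a, k) * x**k for k in range(200))
assert isclose(series, (1 + x) ** a, rel_tol=1e-9)
```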

2 Bernoulli Trial and Bernoulli distribution

A Bernoulli trial is a random experiment having 2 possible outcomes, commonly labeled as success and failure. If $X$ is the indicator of success, then $X$ follows a Bernoulli distribution, written $X\sim\mathrm{Bernoulli}(p)$, if $P[X=1]=p$ and $P[X=0]=1-p$.

2.1 Expectation and variance of Bernoulli distribution

Assume that $X\sim\mathrm{Bernoulli}(p)$. Then $E[X]=p$, $E[X^2]=p$, and therefore $\mathrm{Var}[X]=E[X^2]-E[X]^2=p(1-p)$.

3 Binomial Distribution

3.1 Introduction of binomial distribution

Assume that you are testing the toys manufactured by a factory, where the probability that a toy is defective is $p$. In order to decide whether to accept the batch, you randomly sample $n$ toys. Let $X$ denote the number of defective toys ($0\le X\le n$). Then $X$ follows a binomial distribution with parameters $n$ and $p$, written $X\sim\mathrm{Bin}(n,p)$. The pmf of $X$ is $$P[X=k]=\binom{n}{k}p^k(1-p)^{n-k},\quad k\in\{0,1,\dots,n\}.$$

Tip

We can consider the binomial distribution as $n$ independent Bernoulli trials: the random variable $X\sim\mathrm{Bin}(n,p)$ denotes the number of successes in $n$ independent Bernoulli trials, $$X=\xi_1+\xi_2+\cdots+\xi_n,\qquad \xi_i\sim\mathrm{Bernoulli}(p).$$

3.2 Expectation and Variance of binomial distribution

  1. We can directly compute the expectation and variance from the first and second moments of $X$:
$$E[X]=\sum_{k=0}^{n}k\binom{n}{k}p^k(1-p)^{n-k}=\sum_{k=1}^{n}k\binom{n}{k}p^k(1-p)^{n-k}=np\sum_{k=1}^{n}\binom{n-1}{k-1}p^{k-1}(1-p)^{(n-1)-(k-1)}=np\sum_{t=0}^{n-1}\binom{n-1}{t}p^{t}(1-p)^{n-1-t}=np.$$
$$\begin{aligned}E[X^2]&=\sum_{k=0}^{n}k^2\binom{n}{k}p^k(1-p)^{n-k}=\sum_{k=1}^{n}k(k-1)\binom{n}{k}p^k(1-p)^{n-k}+\sum_{k=1}^{n}k\binom{n}{k}p^k(1-p)^{n-k}\\&=p^2n(n-1)\sum_{k=2}^{n}\binom{n-2}{k-2}p^{k-2}(1-p)^{(n-2)-(k-2)}+E[X]=p^2n(n-1)\sum_{t=0}^{n-2}\binom{n-2}{t}p^{t}(1-p)^{n-2-t}+E[X]\\&=p^2n(n-1)+np.\end{aligned}$$

Therefore $E[X]=np$ and $\mathrm{Var}[X]=E[X^2]-E[X]^2=p^2n(n-1)+np-(np)^2=np(1-p)$.

  2. Let $\xi_i$ denote the indicator variable of success of the $i$th Bernoulli trial. Then the number of successes can be expressed as $X=\xi_1+\cdots+\xi_n$. Using the expectation and variance of the Bernoulli distribution, together with linearity of expectation and additivity of variance for independent random variables, we get $$E[X]=\sum_{i=1}^{n}E[\xi_i]=np,\qquad \mathrm{Var}[X]=\sum_{i=1}^{n}\mathrm{Var}[\xi_i]=np(1-p).$$
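The closed forms $E[X]=np$ and $\mathrm{Var}[X]=np(1-p)$ can be verified by summing the pmf exactly with `math.comb`; this is a sanity-check sketch, not from the notes, and the values of `n` and `p` are arbitrary.

```python
from math import comb, isclose

n, p = 12, 0.3

def pmf(k: int) -> float:
    # Binomial pmf P[X = k] for X ~ Bin(n, p)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

mean = sum(k * pmf(k) for k in range(n + 1))
second = sum(k * k * pmf(k) for k in range(n + 1))
var = second - mean**2

assert isclose(mean, n * p)
assert isclose(var, n * p * (1 - p))
```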

4 Geometric Distribution

4.1 Pmf of Geometric Distribution

The geometric distribution gives the probability that the first occurrence of success requires $k$ independent trials, each with success probability $p$. Assume a random variable $X\sim\mathrm{Geo}(p)$; then $$P[X=k]=(1-p)^{k-1}p,\quad k=1,2,3,\dots$$

4.2 Expectation of Geometric Distribution

Here we use an interesting differentiation trick to solve $E[X]$ where $X\sim\mathrm{Geo}(p)$. Writing $q=1-p$,
$$E[X]=\sum_{k=1}^{\infty}kq^{k-1}p=p\sum_{k=1}^{\infty}\frac{d}{dq}q^{k}=p\,\frac{d}{dq}\left(\frac{q}{1-q}\right)=\frac{p}{(1-q)^2}=\frac{1}{p}.$$
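The series above converges quickly, so truncating it gives a cheap numerical check of $E[X]=1/p$ (an illustrative sketch, not part of the original notes; `p` is an arbitrary choice).

```python
from math import isclose

p = 0.25
q = 1 - p
# Truncate the series sum_{k>=1} k q^{k-1} p; the tail decays geometrically.
mean = sum(k * q ** (k - 1) * p for k in range(1, 2000))
assert isclose(mean, 1 / p, rel_tol=1e-9)
```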

4.3 Variance of Geometric Distribution

A similar computation with the second derivative of $\sum_{k}q^{k}$ gives $E[X(X-1)]=\frac{2(1-p)}{p^2}$, hence for $X\sim\mathrm{Geo}(p)$,
$$\mathrm{Var}[X]=E[X(X-1)]+E[X]-E[X]^2=\frac{2(1-p)}{p^2}+\frac{1}{p}-\frac{1}{p^2}=\frac{1-p}{p^2}.$$

4.4 Sum of Geometric-distributed random variables

Assume we have $n$ independent random variables $X_1,\dots,X_n$ satisfying $X_i\sim\mathrm{Geo}(p)$, $i=1,\dots,n$. From the definition of the geometric distribution we know that each $X_i$ counts the number of trials until the first success. Then $Y=X_1+X_2+\cdots+X_n$ is the number of trials until the $n$th success. Therefore the last trial is a success and the preceding $k-1$ trials contain exactly $n-1$ successes. Thus
$$P[Y=k]=\binom{k-1}{n-1}p^{n-1}(1-p)^{k-n}\cdot p=\binom{k-1}{n-1}p^{n}(1-p)^{k-n},\quad k=n,n+1,\dots$$
We call this distribution the Negative Binomial Distribution.
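The claim that a sum of i.i.d. geometric variables follows this pmf can be checked by simulation; this is an illustrative sketch under arbitrary choices of `n`, `p`, the seed, and the probe point `k`, not part of the original notes.

```python
import random
from math import comb

random.seed(0)
n, p = 3, 0.4

def geometric(p: float) -> int:
    """Number of Bernoulli(p) trials until the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

N = 200_000
counts = {}
for _ in range(N):
    y = sum(geometric(p) for _ in range(n))
    counts[y] = counts.get(y, 0) + 1

# Compare the empirical frequency of Y = k with the negative binomial pmf.
k = 5
empirical = counts.get(k, 0) / N
exact = comb(k - 1, n - 1) * p**n * (1 - p) ** (k - n)
assert abs(empirical - exact) < 0.01
```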

5 Pascal Distribution

In a sequence of independent Bernoulli trials with success probability $p$, the Pascal distribution focuses on the number of failures when the $r$th success happens. If a random variable $X\sim\mathrm{Pascal}(r,p)$, then $$P[X=k]=\binom{k+r-1}{k}p^r(1-p)^k,\quad k=0,1,2,\dots$$

5.1 Expectation and variance of Pascal distribution

If $X\sim\mathrm{Pascal}(r,p)$, then $$E[X]=\frac{r(1-p)}{p},\qquad \mathrm{Var}[X]=\frac{r(1-p)}{p^2}.$$

6 Negative Binomial Distribution

In a sequence of independent Bernoulli trials with success probability $p$, the negative binomial distribution focuses on the number of trials when the $r$th success happens. If a random variable $X\sim\mathrm{NB}(r,p)$, then $$P[X=k]=\binom{k-1}{r-1}p^r(1-p)^{k-r},\quad k=r,r+1,\dots$$ From the above discussion about the geometric distribution, we have $X=X_1+X_2+\cdots+X_r$, where the $X_i\sim\mathrm{Geo}(p)$ ($i\in[r]$) are independent and $X_i$ denotes the number of trials needed after the $(i-1)$st success to obtain the $i$th success.

6.1 Expectation and variance of negative binomial distribution

$$E[X]=\sum_{k=r}^{\infty}k\binom{k-1}{r-1}p^r(1-p)^{k-r}.$$

Rather than evaluating this sum directly, we can use the decomposition $X=X_1+\cdots+X_r$ into independent $\mathrm{Geo}(p)$ variables:
$$E[X]=\sum_{i=1}^{r}E[X_i]=\frac{r}{p},\qquad \mathrm{Var}[X]=\sum_{i=1}^{r}\mathrm{Var}[X_i]=\frac{r(1-p)}{p^2}.$$

7 Gaussian Distribution

7.1 Pdf of Gaussian Distribution

If a random variable $X\sim N(\mu,\sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance, then the pdf of $X$ is $$f(x)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}.$$

7.2 Mgf of Gaussian Distribution

Assume a random variable $X\sim N(\mu,\sigma^2)$. Then the mgf of $X$ is $$M_X(t)=E[e^{tX}]=\exp\left(\mu t+\frac{1}{2}\sigma^2t^2\right).$$

7.3 Characteristic function of Gaussian Variables

Since we know the mgf of a Gaussian variable $X\sim N(\mu,\sigma^2)$ is $M_X(t)=E[e^{tX}]=\exp\left(\mu t+\frac{1}{2}\sigma^2t^2\right)$, the characteristic function of $X$ is equal to $$\varphi_X(t)=E[e^{itX}]=M_X(it)=\exp\left(i\mu t-\frac{1}{2}\sigma^2t^2\right).$$

7.4 Moments of standard Gaussian

Assume that the random variable $X\sim N(0,1)$ and that $k\in\mathbb{N}$. Then $$E[X^k]=\begin{cases}0 & k\text{ odd},\\ (k-1)!! & k\text{ even}.\end{cases}$$
First we state an important lemma:

Relation between Moments and derivatives of mgf

$$M_X^{(n)}(0)=E[X^n].$$

Since we have the mgf of Gaussian variables, moments can be obtained via differentiation. We also provide a direct proof for the moments of the standard Gaussian.

If $k$ is odd, then it's obvious by symmetry that $E[X^k]=0$, so the following process assumes that $k$ is even. Substituting $t=x^2/2$,
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}x^k e^{-\frac{x^2}{2}}dx=\sqrt{\frac{2}{\pi}}\int_{0}^{\infty}x^k e^{-\frac{x^2}{2}}dx=\sqrt{\frac{2}{\pi}}\int_{0}^{\infty}(2t)^{\frac{k-1}{2}}e^{-t}dt=\sqrt{\frac{2}{\pi}}\,2^{\frac{k-1}{2}}\Gamma\left(\frac{k+1}{2}\right)=\sqrt{\frac{2}{\pi}}\,2^{\frac{k-1}{2}}\cdot\frac{(k-1)!!}{2^{k/2}}\Gamma\left(\frac{1}{2}\right)=(k-1)!!$$
(recall that $\Gamma\left(\frac{1}{2}\right)=\sqrt{\pi}$).
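The double-factorial formula can be checked against a direct numerical integration of $x^k\phi(x)$; this is a rough sketch (not from the notes) using a plain trapezoidal rule, with the truncation interval and step count chosen arbitrarily.

```python
from math import exp, pi, sqrt, isclose

def double_factorial(n: int) -> int:
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def gaussian_moment(k: int, lo=-12.0, hi=12.0, steps=200_000) -> float:
    """Trapezoidal approximation of E[X^k] for X ~ N(0,1)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * x**k * exp(-x * x / 2)
    return total * h / sqrt(2 * pi)

assert isclose(gaussian_moment(4), double_factorial(3), rel_tol=1e-6)  # E[X^4] = 3!! = 3
assert abs(gaussian_moment(5)) < 1e-9                                  # odd moments vanish
```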

7.5 An important property of Standard Gaussian and Mills Ratio

The density $\phi$ of a standard Gaussian variable $X\sim N(0,1)$ has a useful property in terms of derivatives: $$\phi'(x)=-x\phi(x).$$

7.6 Expectation of absolute value of Gaussian variables

If a random variable $Z\sim N(0,1)$, then $E[|Z|]=\sqrt{\frac{2}{\pi}}$.

7.7 Rotational Invariance of Gaussian Variables

If a matrix $R\in\mathbb{R}^{n\times n}$ is orthogonal, meaning that $R^\top R=RR^\top=I_n$, then a random vector $X\sim N(0,\sigma^2 I_n)$ satisfies $$RX\overset{d}{=}X.$$ This can be proved using the linear transformation property of multivariate Gaussian variables.
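Rotational invariance can be illustrated by rotating a 2-dimensional standard Gaussian sample and checking that the first two moments still match $N(0,\sigma^2 I_2)$; a simulation sketch, not part of the original notes, with the angle, sample size, and seed chosen arbitrarily.

```python
import random
from math import cos, sin, pi

random.seed(1)
sigma = 1.0
theta = pi / 6  # an arbitrary rotation angle; R is orthogonal

N = 100_000
rx, ry = [], []
for _ in range(N):
    x1 = random.gauss(0.0, sigma)
    x2 = random.gauss(0.0, sigma)
    # Apply the rotation R = [[cos, -sin], [sin, cos]] to (x1, x2).
    rx.append(cos(theta) * x1 - sin(theta) * x2)
    ry.append(sin(theta) * x1 + cos(theta) * x2)

# The rotated sample should still look like N(0, sigma^2 I_2):
mean_x = sum(rx) / N
var_x = sum(v * v for v in rx) / N
cov_xy = sum(a * b for a, b in zip(rx, ry)) / N

assert abs(mean_x) < 0.02
assert abs(var_x - sigma**2) < 0.02
assert abs(cov_xy) < 0.02
```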

8 Multivariate Gaussian Distribution

8.1 Definition of Standard Normal Random Vector

A real random vector $X=(X_1,\dots,X_k)$ is called a standard normal random vector if all of its components $X_i$ ($i\in[k]$) are independent standard Gaussian variables. We denote it $X\sim N(0,I_k)$, where the mean vector is $0$ and the covariance matrix is $I_k$.

8.2 Definition of Normal random vector

A real random vector $X=(X_1,\dots,X_k)$ is called a normal random vector if there exists a standard normal random vector $Z\in\mathbb{R}^{l}$, a $k\times l$ matrix $A$, and a $k$-dimensional vector $\mu$ such that $X=AZ+\mu$ (Wikipedia). We denote it $X\sim N(\mu,\Sigma)$, where $\mu$ is the mean vector and $\Sigma$ is the covariance matrix:
$$\mu=\begin{pmatrix}\mu_1\\\mu_2\\\vdots\\\mu_k\end{pmatrix},\qquad \Sigma=\begin{pmatrix}\sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1k}\sigma_1\sigma_k\\ \rho_{21}\sigma_2\sigma_1 & \sigma_2^2 & \cdots & \rho_{2k}\sigma_2\sigma_k\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{k1}\sigma_k\sigma_1 & \rho_{k2}\sigma_k\sigma_2 & \cdots & \sigma_k^2\end{pmatrix},$$
where $\Sigma_{ij}$ is the covariance between $X_i$ and $X_j$ ($i,j\in[k]$) and $\mu_i$ is the mean of $X_i$ ($i\in[k]$).

8.3 Joint pdf of multivariate Gaussian distribution

Assume that $X\in\mathbb{R}^k$, $X\sim N(\mu,\Sigma)$ with $\Sigma$ nonsingular. Then $$f_X(x_1,\dots,x_k)=\frac{1}{\sqrt{(2\pi)^k|\Sigma|}}\exp\left\{-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right\}.$$

8.4 Characteristic function of multivariate Gaussian variables

If $X\in\mathbb{R}^k$, $X\sim N(\mu,\Sigma)$, then $$\varphi_X(t)=E[e^{it^\top X}]=\exp\left\{it^\top\mu-\frac{1}{2}t^\top\Sigma t\right\},\quad t\in\mathbb{R}^k.$$

8.5 Linear Transformation of multivariate Gaussian variables

If $X\sim N(\mu,\Sigma)$ and $Y=\alpha+AX$, then $Y$ is again normal with $$E[Y]=A\mu+\alpha,\qquad \mathrm{Var}[Y]=A\Sigma A^\top.$$

9 Chi-squared Distribution

9.1 Definition of Chi-squared Distribution

Assume that we have $n$ i.i.d. samples $X_1,\dots,X_n$ from $N(0,1)$. Then $$\sum_{i=1}^{n}X_i^2\sim\chi_n^2,$$ which is called the Chi-squared distribution with $n$ degrees of freedom.

9.2 Pdf of Chi-squared Distribution

From the first part, if $X\sim N(0,1)$, then $Y=X^2\sim\chi_1^2$. For $y>0$,
$$F_Y(y)=P[X^2\le y]=P[-\sqrt{y}\le X\le\sqrt{y}]=\int_{-\sqrt{y}}^{\sqrt{y}}\frac{1}{\sqrt{2\pi}}e^{-\frac{t^2}{2}}dt.$$
Let $G(t)$ denote an antiderivative of $\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{t^2}{2}\right\}$. Therefore
$$f_Y(y)=F_Y'(y)=\left(G(\sqrt{y})-G(-\sqrt{y})\right)'=\frac{1}{\sqrt{2\pi}}e^{-\frac{y}{2}}\frac{1}{\sqrt{y}}\overset{(1)}{=}\frac{(1/2)^{1/2}}{\Gamma(1/2)}y^{\frac{1}{2}-1}e^{-\frac{1}{2}y},$$
where $(1)$ is because $\Gamma\left(\frac{1}{2}\right)=\sqrt{\pi}$.

pdf of $\chi_1^2$

$$f_Y(y)=\frac{1}{\sqrt{2\pi}}e^{-\frac{y}{2}}\frac{1}{\sqrt{y}},\quad y>0.$$

9.3 Relation to Gamma Distribution

In the derivation of the pdf of the Chi-squared distribution we can conclude that $$\chi_1^2\sim\Gamma\left(\frac{1}{2},\frac{1}{2}\right),$$ which is a special case of the Gamma distribution. Therefore, if a random variable $X\sim\chi_n^2$, then $X\sim\Gamma\left(\frac{n}{2},\frac{1}{2}\right)$.
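The identification $\chi_1^2=\Gamma\left(\frac12,\frac12\right)$ can be verified pointwise by comparing the density just derived with the shape-rate Gamma pdf; a small sketch, not from the notes, with the probe points chosen arbitrarily.

```python
from math import exp, gamma, pi, sqrt, isclose

def chi2_1_pdf(y: float) -> float:
    # Density derived from Y = X^2 with X ~ N(0,1)
    return exp(-y / 2) / (sqrt(2 * pi) * sqrt(y))

def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    # Shape-rate parametrization of the Gamma density
    return beta**alpha / gamma(alpha) * x ** (alpha - 1) * exp(-beta * x)

for y in (0.1, 0.5, 1.0, 2.5, 7.0):
    assert isclose(chi2_1_pdf(y), gamma_pdf(y, 0.5, 0.5), rel_tol=1e-12)
```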

9.4 Expectation of Chi-squared Distribution

Given that if a random variable $Y\sim\Gamma(\alpha,\beta)$ then $E[Y]=\frac{\alpha}{\beta}$, for a random variable $X\sim\chi_n^2=\Gamma\left(\frac{n}{2},\frac{1}{2}\right)$ we get $E[X]=n$.

9.5 Variance of Chi-squared Distribution

Given that if a random variable $Y\sim\Gamma(\alpha,\beta)$ then $\mathrm{Var}[Y]=\frac{\alpha}{\beta^2}$, for a random variable $X\sim\chi_n^2$ we get $\mathrm{Var}[X]=2n$.

9.6 Mgf of Chi-squared Distribution

Given that for a Gamma distributed random variable $X\sim\Gamma(\alpha,\beta)$ the mgf is $M_X(t)=\left(1-\frac{t}{\beta}\right)^{-\alpha}$, and $Y\sim\chi_n^2$ is equivalent to $Y\sim\Gamma\left(\frac{n}{2},\frac{1}{2}\right)$, we get $$M_Y(t)=(1-2t)^{-\frac{n}{2}},\quad t<\frac{1}{2}.$$

9.7 Characteristic function of Chi-squared Distribution

Since for $Y\sim\chi_n^2$ we have $M_Y(t)=(1-2t)^{-\frac{n}{2}}$, therefore $$\varphi_Y(t)=M_Y(it)=(1-2it)^{-\frac{n}{2}}.$$

9.8 Asymptotic Property of Chi-squared Distribution

9.8.1 LLN

From the definition of the Chi-squared distribution we have $Y=\sum_{i=1}^{n}X_i^2\sim\chi_n^2$ with $X_i\sim N(0,1)$ i.i.d. Treating each $X_i^2$ as an i.i.d. sample from the same distribution with mean $E[X_i^2]=1$, the law of large numbers gives $$\frac{\sum_{i=1}^{n}X_i^2}{n}\overset{P}{\to}E[X_i^2]=1,\quad\text{thus}\quad \frac{Y}{n}\overset{P}{\to}1,\quad Y\sim\chi_n^2.$$

9.8.2 CLT

Similar to the LLN part, since the $X_i^2$ are i.i.d. with mean $1$ and variance $\mathrm{Var}[X_i^2]=E[X_i^4]-E[X_i^2]^2=3-1=2$, we can apply the Central Limit Theorem to get convergence in distribution: $$\frac{Y-n}{\sqrt{2n}}\overset{L}{\to}N(0,1),\quad Y\sim\chi_n^2.$$
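The CLT statement can be illustrated by simulating standardized $\chi_n^2$ samples and checking that their first two moments are close to those of $N(0,1)$; a simulation sketch, not part of the original notes, with `n`, `N`, and the seed chosen arbitrarily.

```python
import random
from math import sqrt

random.seed(2)
n = 50          # degrees of freedom
N = 20_000      # number of chi-squared samples

zs = []
for _ in range(N):
    y = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
    zs.append((y - n) / sqrt(2 * n))

mean = sum(zs) / N
var = sum(z * z for z in zs) / N - mean**2

# The standardized values should be close to N(0, 1).
assert abs(mean) < 0.05
assert abs(var - 1.0) < 0.05
```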

10 Exponential Distribution

10.1 Pdf of Exponential Distribution

If a random variable $X\sim\mathrm{Exp}(\lambda)$, then the pdf of $X$ is $$f(x)=\begin{cases}\lambda e^{-\lambda x} & x>0,\\ 0 & \text{elsewhere}.\end{cases}$$

10.2 The tail probability of Exponential Distribution

If $X\sim\mathrm{Exp}(\lambda)$ with $\lambda>0$, then the tail probability is $P[X\ge t]=e^{-\lambda t}$ for $t\ge 0$.

10.3 Memoryless Property

Memoryless property of Exponential distributed variable

$$P[X>s+t\mid X>s]=P[X>t].$$

This holds since $$P[X>s+t\mid X>s]=\frac{P[X>s+t]}{P[X>s]}=\frac{e^{-\lambda(s+t)}}{e^{-\lambda s}}=e^{-\lambda t}.$$
Rewriting $P[X>s+t\mid X>s]$ as $P[X-s>t\mid X>s]$, we can conclude that $(X-s)\mid X>s$ and $X$ have the same distribution.
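The memoryless property can be checked empirically with `random.expovariate`; this is an illustrative sketch (not from the notes), with the rate, thresholds, and seed chosen arbitrarily.

```python
import random
from math import exp

random.seed(3)
lam, s, t = 1.5, 0.8, 0.6

N = 300_000
samples = [random.expovariate(lam) for _ in range(N)]

survivors = [x for x in samples if x > s]
cond = sum(1 for x in survivors if x > s + t) / len(survivors)
uncond = sum(1 for x in samples if x > t) / N

# Both should estimate e^{-lambda * t}.
assert abs(cond - exp(-lam * t)) < 0.01
assert abs(uncond - exp(-lam * t)) < 0.01
```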

10.4 Relation to Gamma Distribution

The exponential distribution is a special case of the Gamma distribution: using the shape-rate version, for $X\sim\Gamma(1,\alpha)$ we find $$f(x)=\frac{\alpha}{\Gamma(1)}x^{1-1}e^{-\alpha x}=\alpha e^{-\alpha x},\quad x>0.$$ Therefore $X\sim\mathrm{Exp}(\lambda)\iff X\sim\Gamma(1,\lambda)$.

11 Poisson Distribution

11.1 Pmf of Poisson Distribution

Assume a random variable $X\sim\mathrm{Poisson}(\lambda)$. The pmf of $X$ is given as follows: $$P[X=k]=\frac{e^{-\lambda}\lambda^k}{k!},\quad \lambda>0,\ k=0,1,2,\dots$$

11.2 Expectation and variance of Poisson Distribution

First we have an important property of the Poisson distribution: if $X\sim\mathrm{Poisson}(\lambda)$, then the factorial moments satisfy $$E[X(X-1)\cdots(X-k)]=\lambda^{k+1},\quad k=0,1,2,\dots$$ Using this property we can quickly get $E[X]=\lambda$ and $E[X^2]=E[X(X-1)]+E[X]=\lambda^2+\lambda$; therefore $\mathrm{Var}[X]=E[X^2]-E[X]^2=\lambda$.

11.3 Mgf of Poisson Distribution

If a random variable $X\sim\mathrm{Poisson}(\lambda)$ where $\lambda>0$, then the mgf of $X$ is $$M_X(t)=\exp\{\lambda(e^t-1)\}.$$

11.4 Reproducibility of Poisson distribution

If $X_1,\dots,X_n$ are independent variables satisfying $X_i\sim\mathrm{Poisson}(\lambda_i)$, then $$\sum_{i=1}^{n}X_i\sim\mathrm{Poisson}\left(\sum_{i=1}^{n}\lambda_i\right).$$
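Reproducibility can be checked exactly (no simulation needed) by convolving two Poisson pmfs and comparing with the pmf of the summed parameter; a small sketch, not part of the original notes, with arbitrary rates.

```python
from math import exp, factorial, isclose

def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)

lam1, lam2 = 2.0, 3.5

# Convolve the two pmfs and compare with Poisson(lam1 + lam2).
for k in range(20):
    conv = sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))
    assert isclose(conv, poisson_pmf(k, lam1 + lam2), rel_tol=1e-9)
```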

11.5 Poisson approximation to the Binomial distribution

For $n$ Bernoulli trials with success probability $p$, if $n$ is large and $p$ is small (with $np$ moderate), then the binomial distribution is approximately the Poisson distribution with parameter $\lambda=np$.

12 Gamma Distribution

12.1 Γ function

The Gamma function Γ() is defined as follows:

$$\Gamma(\alpha)=\int_0^{\infty}y^{\alpha-1}e^{-y}dy,\quad \alpha>0.$$

Integration by parts shows that, for $\alpha>1$, $$\Gamma(\alpha)=(\alpha-1)\int_0^{\infty}y^{\alpha-2}e^{-y}dy=(\alpha-1)\Gamma(\alpha-1).$$ Given that $\Gamma(1)=\int_0^{\infty}e^{-y}dy=1$, it follows that if $\alpha$ is a positive integer, then $\Gamma(\alpha)=(\alpha-1)!$.

Recursive property of Gamma function

For all $x>0$, the Gamma function satisfies the following recursion: $$\Gamma(x+1)=x\Gamma(x),\quad x>0.$$
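Python's `math.gamma` can be used to spot-check the recursion, the factorial values, and the value $\Gamma\left(\frac12\right)=\sqrt{\pi}$ used repeatedly in these notes; a quick sketch, not part of the original text, with arbitrary probe points.

```python
from math import gamma, factorial, isclose, pi, sqrt

# Recursion: Gamma(x+1) = x * Gamma(x)
for x in (0.5, 1.7, 3.2, 9.9):
    assert isclose(gamma(x + 1), x * gamma(x), rel_tol=1e-12)

# Integer values: Gamma(n) = (n-1)!
for n in range(1, 10):
    assert isclose(gamma(n), factorial(n - 1))

# Gamma(1/2) = sqrt(pi)
assert isclose(gamma(0.5), sqrt(pi), rel_tol=1e-12)
```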

12.2 Γ(α,β) Distribution (shape-scale version)

We say that the continuous random variable $X$ has a $\Gamma$-distribution with parameters $\alpha>0$ and $\beta>0$ if its pdf is $$f(x)=\begin{cases}\frac{1}{\Gamma(\alpha)\beta^{\alpha}}x^{\alpha-1}e^{-x/\beta} & 0<x<\infty,\\ 0 & \text{elsewhere}.\end{cases}$$ We often write that $X$ has a $\Gamma(\alpha,\beta)$ distribution, where $\alpha$ is the shape parameter and $\beta$ is the scale parameter.

12.3 Γ(α,β) Distribution (shape-rate version)

We say that the continuous random variable $X$ has a $\Gamma$-distribution with parameters $\alpha>0$ and $\beta>0$ if its pdf is $$f(x)=\begin{cases}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x} & 0<x<\infty,\\ 0 & \text{elsewhere}.\end{cases}$$ We often write that $X$ has a $\Gamma(\alpha,\beta)$ distribution, where $\alpha$ is the shape parameter and $\beta$ is the rate parameter. Throughout this article we use this version of the Gamma distribution.

12.4 Expectation of Γ(α,β) Distribution

We can use the definition of the Gamma function to simplify the computation of the integral:
$$E[X]=\int_0^{\infty}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}\cdot x\,dx=\int_0^{\infty}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha}e^{-\beta x}dx=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\cdot\frac{\Gamma(\alpha+1)}{\beta^{\alpha+1}}\int_0^{\infty}\frac{\beta^{\alpha+1}}{\Gamma(\alpha+1)}x^{\alpha}e^{-\beta x}dx=\frac{\alpha}{\beta}.$$

12.5 Variance of Γ(α,β) Distribution

Similarly, we can use the definition of the Gamma function to simplify the computation of the integral:
$$E[X^2]=\int_0^{\infty}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}\cdot x^2\,dx=\int_0^{\infty}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha+1}e^{-\beta x}dx=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\cdot\frac{\Gamma(\alpha+2)}{\beta^{\alpha+2}}\int_0^{\infty}\frac{\beta^{\alpha+2}}{\Gamma(\alpha+2)}x^{\alpha+1}e^{-\beta x}dx=\frac{\alpha(\alpha+1)}{\beta^2}.$$
Therefore $\mathrm{Var}[X]=E[X^2]-(E[X])^2=\frac{\alpha}{\beta^2}$.

12.6 Mgf of Γ(α,β) distribution

$$M_X(t)=\int_0^{\infty}e^{tx}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}dx=\int_0^{\infty}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-(\beta-t)x}dx=\left(\frac{1}{1-\frac{t}{\beta}}\right)^{\alpha},\quad t<\beta.$$

12.6.1 Additivity property of the Gamma Distribution

If $X\sim\Gamma(\alpha_1,\theta)$ and $Y\sim\Gamma(\alpha_2,\theta)$ are independent, then $X+Y\sim\Gamma(\alpha_1+\alpha_2,\theta)$.

This can be proved using the mgf of the Gamma distribution: $M_{X+Y}(t)=M_X(t)M_Y(t)=\left(1-\frac{t}{\theta}\right)^{-(\alpha_1+\alpha_2)}$.
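Additivity can also be illustrated by simulation with `random.gammavariate`; note that Python's standard library uses the shape-scale parametrization, so a rate $\theta$ corresponds to scale $1/\theta$. This sketch (not from the notes, with arbitrary parameters and seed) only checks the first two moments, whereas the mgf argument above proves equality of distributions.

```python
import random

random.seed(4)
a1, a2, rate = 1.5, 2.5, 2.0
scale = 1 / rate  # random.gammavariate uses the shape-scale parametrization

N = 200_000
sums = [random.gammavariate(a1, scale) + random.gammavariate(a2, scale) for _ in range(N)]

mean = sum(sums) / N
var = sum(s * s for s in sums) / N - mean**2

# X + Y should behave like Gamma(a1 + a2, rate): mean (a1+a2)/rate, var (a1+a2)/rate^2
assert abs(mean - (a1 + a2) / rate) < 0.02
assert abs(var - (a1 + a2) / rate**2) < 0.02
```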

13 Inverse Gamma Distribution

13.1 Pdf of Inverse Gamma Distribution

If a random variable $Y\sim\mathrm{IG}(\alpha,\beta)$, then $$f(y)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}y^{-(\alpha+1)}e^{-\beta/y},\quad y>0.$$

14 Beta Distribution

14.1 Beta Function

The Beta function is defined by the integral $$B(r_1,r_2)=\int_0^1 t^{r_1-1}(1-t)^{r_2-1}dt.$$

Association between Beta function and Gamma function

$$B(\alpha,\beta)=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}.$$

Write $\Gamma(\alpha)\Gamma(\beta)=\int_0^{\infty}\int_0^{\infty}e^{-x-y}x^{\alpha-1}y^{\beta-1}dx\,dy$ and apply the variable substitution $x=st$, $y=s(1-t)$ (with Jacobian $s$). The integral becomes
$$\int_0^{\infty}\int_0^{1}e^{-s}(st)^{\alpha-1}(s(1-t))^{\beta-1}s\,dt\,ds=\int_0^{\infty}e^{-s}s^{\alpha+\beta-1}ds\int_0^1 t^{\alpha-1}(1-t)^{\beta-1}dt=\Gamma(\alpha+\beta)B(\alpha,\beta),$$
thus we have finished the proof.
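The identity can be spot-checked numerically by approximating the Beta integral with a midpoint rule and comparing with `math.gamma`; a rough sketch, not part of the original notes, with the step count and probe parameters chosen arbitrarily.

```python
from math import gamma, isclose

def beta_numeric(a: float, b: float, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the Beta integral on (0, 1)."""
    h = 1.0 / steps
    return sum(
        ((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
        for i in range(steps)
    ) * h

for a, b in [(2.0, 3.0), (1.5, 0.8), (4.2, 4.2)]:
    exact = gamma(a) * gamma(b) / gamma(a + b)
    assert isclose(beta_numeric(a, b), exact, rel_tol=1e-3)
```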

Another expression of Beta function

We can get another version of the Beta function using the variable substitution $t=\frac{x}{1+x}$, $x\in(0,\infty)$, so that $1-t=\frac{1}{1+x}$ and $dt=\frac{dx}{(1+x)^2}$. Then the integral is equivalent to
$$B(r_1,r_2)=\int_0^{\infty}\left(\frac{x}{1+x}\right)^{r_1-1}\left(\frac{1}{1+x}\right)^{r_2-1}\frac{1}{(1+x)^2}dx=\int_0^{\infty}x^{r_1-1}\left(\frac{1}{1+x}\right)^{r_1+r_2}dx=\int_0^{\infty}x^{r_1-1}(1+x)^{-r_1-r_2}dx.$$

14.2 Pdf of beta distribution

The beta distribution $\mathrm{Beta}(\alpha,\beta)$ is a two-parameter distribution ($\alpha>0$, $\beta>0$) with range $[0,1]$ and pdf
$$f(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}=\frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1}.$$
For integer parameters the normalizing constant equals $\frac{(\alpha+\beta-1)!}{(\alpha-1)!(\beta-1)!}$.

14.3 Expectation of beta distribution

We can use the pdf of the beta distribution to get the expectation of a beta distributed random variable easily, as in the computation that follows:
$$E[X]=\int_0^1\frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1}\cdot x\,dx=\int_0^1\frac{1}{B(\alpha,\beta)}x^{\alpha}(1-x)^{\beta-1}dx=\frac{B(\alpha+1,\beta)}{B(\alpha,\beta)}\int_0^1\frac{1}{B(\alpha+1,\beta)}x^{\alpha}(1-x)^{\beta-1}dx=\frac{B(\alpha+1,\beta)}{B(\alpha,\beta)}=\frac{\alpha}{\alpha+\beta}.$$

14.4 Variance of beta distribution

Similarly, we can use the pdf of the beta distribution to derive $E[X^2]=\frac{B(\alpha+2,\beta)}{B(\alpha,\beta)}$, and with $\mathrm{Var}[X]=E[X^2]-E[X]^2$ we get
$$\mathrm{Var}[X]=\frac{B(\alpha+2,\beta)}{B(\alpha,\beta)}-\left(\frac{B(\alpha+1,\beta)}{B(\alpha,\beta)}\right)^2=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$

14.5 Relation to Chi-squared Distribution

Let $X$ and $Y$ be independent variables with $X\sim\chi_n^2$ and $Y\sim\chi_m^2$. Then $$\frac{X}{X+Y}\sim\mathrm{Beta}\left(\frac{n}{2},\frac{m}{2}\right).$$

15 Cauchy Distribution

15.1 Pdf of Cauchy Distribution

A random variable $X$ is said to follow a Cauchy distribution with location parameter $\theta\in\mathbb{R}$ and scale parameter $\gamma>0$, written $X\sim\mathrm{Cauchy}(\theta,\gamma)$, if $$f(x;\theta,\gamma)=\frac{1}{\pi\gamma\left[1+\left(\frac{x-\theta}{\gamma}\right)^2\right]},\quad x\in\mathbb{R}.$$

A special property of Cauchy Distribution

The expectation of the Cauchy distribution doesn't exist, since $\int_{-\infty}^{\infty}|x|\,f(x;\theta,\gamma)\,dx=\infty$.

15.2 Median of Cauchy Distribution

Although the mean does not exist, the pdf is symmetric about $\theta$, so the median of $\mathrm{Cauchy}(\theta,\gamma)$ is $\theta$.

15.3 Characteristic function of Cauchy distribution

For $X\sim\mathrm{Cauchy}(\theta,\gamma)$, $$\varphi_X(t)=E[e^{itX}]=\exp\{i\theta t-\gamma|t|\}.$$ Note that $\varphi_X$ is not differentiable at $t=0$, consistent with the fact that the mean does not exist.

16 T Distribution

16.1 Definition

Assume that $X\sim N(0,1)$ and $Y\sim\chi_n^2$ are independent. Then the statistic

$$T=\frac{X}{\sqrt{Y/n}}$$

is said to follow a t-distribution with $n$ degrees of freedom, written $T\sim t_n$.

16.2 Pdf of t-distribution

For a random variable $T\sim t_n$, the density function of $T$ is given by $$f_T(t)=\frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\sqrt{\pi n}}\left(\frac{t^2}{n}+1\right)^{-\frac{n+1}{2}}.$$
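A basic sanity check on this density is that it integrates to 1; the sketch below (not from the notes) approximates the integral with a trapezoidal rule over a wide interval, with `n`, the interval, and the step count chosen arbitrarily.

```python
from math import gamma, pi, sqrt, isclose

def t_pdf(t: float, n: int) -> float:
    c = gamma((n + 1) / 2) / (gamma(n / 2) * sqrt(pi * n))
    return c * (t * t / n + 1) ** (-(n + 1) / 2)

# Trapezoidal check that the density integrates to (almost) 1.
n, lo, hi, steps = 5, -200.0, 200.0, 400_000
h = (hi - lo) / steps
total = sum(t_pdf(lo + i * h, n) for i in range(steps + 1)) * h
total -= 0.5 * h * (t_pdf(lo, n) + t_pdf(hi, n))

assert isclose(total, 1.0, rel_tol=1e-3)
```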

17 F Distribution

17.1 Definition

Assume that $X\sim\chi_n^2$ and $Y\sim\chi_m^2$ are independent. Then the statistic $$F=\frac{X/n}{Y/m}\sim F_{n,m}$$ is said to follow an F-distribution with parameters $n$ and $m$.

17.2 Pdf of F-distribution

From the definition we have $F=\frac{X/n}{Y/m}$; this is an example of the density of a ratio of two (scaled) Chi-squared distributed variables. The pdf of $F\sim F_{n,m}$ is given below:
$$f_F(f)=\frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\left(\frac{n}{m}\right)^{\frac{n}{2}}f^{\frac{n}{2}-1}\left(\frac{n}{m}f+1\right)^{-\frac{m+n}{2}},\quad f>0.$$

17.3 Property of F-distribution

$$F_{n,m}(1-\alpha)=\frac{1}{F_{m,n}(\alpha)},$$

where $F_{n,m}(\alpha)$ denotes the $\alpha$-quantile of the $F_{n,m}$ distribution. This follows because if $X\sim F_{n,m}$ then $1/X\sim F_{m,n}$: the event $X\le F_{n,m}(1-\alpha)$, which has probability $1-\alpha$, is the same as $1/X\ge 1/F_{n,m}(1-\alpha)$, so $1/F_{n,m}(1-\alpha)$ is the $\alpha$-quantile of $F_{m,n}$.

17.4 Expectation and Variance of F-distribution

If a random variable $X\sim F_{n,m}$, then $$E[X]=\frac{m}{m-2}\ (m>2),\qquad \mathrm{Var}[X]=\frac{2m^2(n+m-2)}{n(m-2)^2(m-4)}\ (m>4).$$

For the variance of the F-distribution we can use a calculation similar to that of its expectation, using the second expression of the Beta function.


  1. Characteristic function ↩︎

  2. Negative Binomial ↩︎

  3. Hypergeometric ↩︎